2021 VIS Area Curation Committee Executive Summary
Summary
We use submission and bidding information from VIS 2021 to analyze the impact of moving to an area model. Given the information we have access to, the move appears to be broadly successful, and we make only small recommendations regarding area descriptions, example papers, and keyword changes. Our analysis suggests that submissions are relatively balanced across areas, keywords are (with one small exception) well distributed, and the unified PC appears to provide broad and overlapping coverage.
Committee members: Alex Endert (chair), Steven Drucker, Issei Fujishiro, Christoph Garth, Heidi Lam, Heike Leitte, Carlos Scheidegger, Hendrik Strobelt, Penny Rheingans.
This report summarizes the process, findings, and recommendations by the VIS Area Curation Committee (ACC) regarding the areas and keywords used for paper submissions to IEEE VIS 2021. According to the Charter, the goal of this committee is to analyze and report how submissions made use of the areas and keywords to describe their contribution. It is important to understand when these descriptors no longer adequately cover the breadth of research presented at VIS.
This report is generated by members of the ACC for the current year and prepared for the VSC. Upon review, it will be linked from the IEEE VIS website. The conclusions and discussion points are based on submission and reviewer data from IEEE VIS 2021. The report and analysis are focused on the use of keywords, areas, and reviewer matching. Thus, there are likely other aspects of conference organization that are not covered (but could be considered).
The report is organized into the following sections. First, we describe the data and analysis process: which data we used, where it is stored, and how it was obtained. These processes can be adapted for future years of this committee. Second, we discuss key findings from our analysis; these are only highlights, with the complete analyses linked. Finally, we present a collection of recommendations and noteworthy findings that should be “watched” next year to see whether trends emerge.
Data and Process
The data used to perform this analysis is a combination of paper submission data and reviewer bidding data. Both sets were anonymized to minimize the ability to identify IPC members, authors, or reviewers. The scripts used to export the data from PCS and anonymize it can be found here.
This year's analysis uses the anonymized CSV files obtained directly from PCS. You can see the source code used to process the data and generate the plots in this document by clicking on the “Code” buttons, which fold out the Python code used.
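The plotting code below assumes the anonymized exports have already been read into pandas DataFrames such as submissions, keywords, bids, and assignments, along with a few shared helpers. The following is a minimal setup sketch; the file names and helper definitions are illustrative assumptions, not the exact names used in the report's source code.
Code
# Minimal setup sketch. File names and helpers are illustrative assumptions,
# not the exact names used by the report's source code.
import itertools

import pandas as pd
import plotly.express as px

config = {'displayModeBar': False}  # shared Plotly display options

def aspect(ratio, width=900):
    # consistent figure sizing used by the plots below
    return dict(width=width, height=int(width * ratio))

submissions = pd.read_csv('submissions-anon.csv')   # one row per submission
keywords    = pd.read_csv('keywords.csv')           # keyword taxonomy (Short Name, Category, ...)
bids        = pd.read_csv('bids-anon.csv')          # one row per (Reviewer, Paper ID, Bid)
assignments = pd.read_csv('assignments-anon.csv')   # reviewer-to-paper assignments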
In order to facilitate longitudinal studies of this data, we are also providing a sqlite database with the 2021 data, which should make it easier to incorporate data from 2022 and subsequent years. The code that generates this database can be found here.
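As a sketch of how that database might be queried for longitudinal comparisons (the file and table names below are assumptions, not the actual schema):
Code
# Sketch only: file and table names are assumptions, not the actual schema.
import sqlite3
import pandas as pd

with sqlite3.connect('vis-acc-2021.sqlite') as con:
    submissions_2021 = pd.read_sql_query('SELECT * FROM submissions', con)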
Data Highlights
We analyzed anonymized data containing information about the full paper submissions to VIS 2021, the reviews of these submissions, and the IPC bidding preferences. We analyzed this data to understand how well the areas and keywords characterize the body of work submitted this year. We also analyzed the IPC bidding information to understand how well the expertise of the IPC members covers the submissions. Below, we show highlights of our findings.
Note that in the analysis that follows, the submission/paper IDs and reviewer IDs are anonymized through a randomizer and are not the IDs used in PCS for submissions and reviewers.
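To illustrate the kind of randomization applied (a sketch only; the actual anonymization script is the one linked above), each original ID can be replaced by a random surrogate:
Code
# Sketch of ID randomization; the actual anonymization script is linked above.
import numpy as np
import pandas as pd

def randomize_ids(ids: pd.Series, seed: int = 0) -> pd.Series:
    # map each distinct original ID to a distinct random integer
    unique = pd.unique(ids)
    surrogates = np.random.default_rng(seed).permutation(len(unique))
    return ids.map(dict(zip(unique, surrogates)))

# e.g. submissions['Paper ID'] = randomize_ids(submissions['Paper ID'])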
Submissions per Area. We wanted to understand how submissions were distributed across areas, including acceptance decisions. Submissions to each area fell within reasonable upper and lower limits, and decisions did not appear to favor any individual area.
Code
tmp = (submissions
    .value_counts(['Area', 'Decision'])
    .reset_index()
    .rename(columns={0: 'count'}))

fig = px.bar(tmp,
    x='count', y='Area',
    barmode='stack', orientation='h',
    color='Decision', text='count',
    custom_data=['Decision'],
).update_layout(
    title='Submissions by area',
    xaxis_title='Number of Submissions',
    **aspect(0.35)
).update_traces(
    hovertemplate='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)
Keywords Used. We also analyzed how often each keyword was used in the submissions. The distribution of keyword usage is reasonable. The one exception, which should be watched next year, is “Application”, which may require further specification or description.
Code
# do a manual histogram to include non-specified keywords
px.bar(k_total,
    x='Short Name', y=0,
    color='Category',
    # color_discrete_map=keyword_category_colors,
).update_traces(
    hovertemplate="'%{x}' specified in %{y} submissions<extra></extra>",
).update_layout(
    xaxis_tickfont_size=8,
    xaxis_dtick=1,
    yaxis_dtick=20,
    xaxis_title='Keyword',
    yaxis_title='Number of Submissions',
    legend_title='Keyword Category',
    hovermode='closest',
    title='Frequency of keywords across submissions',
    **aspect(0.4)
).show(config=config)
Unified PC Expertise. Finally, we want to highlight that the bidding information from PC members indicates that moving to a unified PC provides ample coverage of expertise for the papers submitted.
Code
tmp = (bids
    .value_counts(['Paper ID', 'Bid'], sort=False)
    .reset_index()
    .loc[lambda x: x.Bid.isin(['want', 'willing'])]
    .rename(columns={0: 'Number of Bids'}))

px.bar(tmp,
    x='Paper ID', y='Number of Bids',
    color='Bid'
).update_layout(
    xaxis_type='category',
    xaxis_categoryorder='total descending',
    xaxis_showticklabels=False,
    title='Positive Bids per Paper',
    **aspect(0.4),
).update_traces(
    hovertemplate='Paper %{x} received %{y} "%{fullData.name}" bids.<extra></extra>',
).show(config=config)
Cross-area reviewing among the unified PC. One concern with the unified PC was that PC members would remain confined to their respective areas, further fragmenting the community. The number of areas each PC member reviewed in, however, suggests this is not the case:
Code
tmp = (assignments
    .merge(submissions, on='Paper ID')
    .groupby('Reviewer')
    .apply(lambda x: len(x['Area'].unique()))
    .reset_index())

px.histogram(tmp,
    x=0,
).update_traces(
    hovertemplate='%{y} PC members were assigned submissions from %{x} area(s)',
).update_layout(
    bargap=.1,
    xaxis_title='Number of Areas',
    yaxis_title='Number of PC members',
    **aspect(0.4),
).show(config=config)
Recommendations
The ACC has the following recommendations with regard to Areas, Keywords, and Bidding for VIS 2022. We also have a list of “watchlist items” that we recommend keeping under observation in future years. We do not find these definitive enough to recommend changes, but they generated considerable discussion among the ACC and should be revisited next year.
Areas
After reviewing the paper areas, we suggest some small changes to the descriptions and example papers. At this time, we do not recommend retiring areas or adding new ones. Specific changes recommended to example papers are:
We moved the following paper to Area 4 (Representation and Interaction):
A. Srinivasan and J. Stasko. “Orko: Facilitating Multimodal Interaction for Visual Network Exploration and Analysis.” IEEE Transactions on Visualization and Computer Graphics, 24(1): 2018.
We added the following paper to Area 5 (Data Transformation):
H. Strobelt, D. Oelke, C. Rohrdantz, A. Stoffel, D. A. Keim, and O. Deussen. “Document Cards: A Top Trumps Visualization for Documents.” IEEE Transactions on Visualization and Computer Graphics, 15(6): 1145-1152, 2009.
Keywords
Based on our analysis of keyword frequencies and feedback, we do not recommend removing keywords at this time, primarily because the review period so far has been short. We recommend the following additions to keywords and keyword descriptions:
To the Data Types category: Sets, Ensemble models
To the Contribution (General) category: Ontology
To the Application Area category: Chemistry (to Life Sciences….), Astronomy (to Physical & Environmental…), Law, Economics, and Social Media (to Social Science…)
In the Topic category, Stats & Math to become Stats, Math & ML
Graph visualization/analysis and Glyph-based techniques (to General Visualization….)
We put forward the following observations and questions for continued consideration:
The keyword Application Motivated Visualization may be oversubscribed, appearing in about 25% of the submissions (110/434 overall; 30/110 accepted papers, 80/324 rejected papers). In contrast, the next most popular keywords, “Data Analysis, Reasoning, Problem Solving, and Decision Making” and “Machine Learning”, appear in 90 and 73 submissions, respectively. This calls for further examination. Specifically, if we wish to split or clarify this keyword, what data do we base that revision on? We recommend repeating this analysis next year to see whether the keyword continues to be used too frequently.
We discussed the issue of whether keywords could be used to indicate specific domain knowledge required to appropriately review some submissions and how that specific knowledge should be signalled by the author and used in the assignment process. One example is a potential keyword for sports visualization (appearing multiple times in the keyword feedback as a requested new keyword). If such a keyword were created and used in assignments, would reviewer expertise in cricket (for example) actually be useful in reviewing a paper about tennis? Specifically, there is an issue with the level of granularity of expertise in this domain (and presumably in other domains). This calls for further thought, hence we do not recommend adding a keyword for sports visualization at this time. We also observe that there is currently no way for reviewers to indicate expertise in a specific domain (for instance, tennis).
We suggest that decisions about adding keywords consider the cost of such additions in the submission, bidding, and other processes.
General Observations and Reflection
Based on the process this year, we have the following observations and reflections on various aspects of this committee and the overarching goals.
Data Collection: Submission, review, and bidding data is required for the analysis we performed. We have created a script to anonymize the PCS exports. However, there are challenges when OPCs have conflicts with papers, so the exports may not be complete. Further, generating this data falls during an already busy time for the paper chairs. In the future, having a separate committee (or person) generate these data exports may be preferable, to offload some work from the paper chairs. One recommendation discussed was to have the person who manages PCS do it (with permission from the OPCs).
IPC Participation Data Usage: In the future, VIS should ask IPC members to acknowledge that their bidding data will be used by the ACC for operational improvements to VIS.
Match Score Transparency: We recommend VIS produce documentation on how the “match score” is computed for the IPC. We recommend this because we currently have no technical basis on which to evaluate the quality of the current matches; if the match score cannot be adequately explained or otherwise justified, we recommend moving to replace the current matching system with one that can be explained.
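As a purely illustrative example of what an explainable score could look like (this is not how PCS computes its match score, and we are not recommending this particular definition), consider a simple keyword-overlap measure:
Code
# Illustrative only: a trivially explainable score based on keyword overlap.
# This is NOT the PCS match score; it merely shows the kind of definition
# that could be documented and audited.
def keyword_overlap_score(paper_keywords: set, reviewer_keywords: set) -> float:
    # fraction of the paper's keywords that the reviewer also claims expertise in
    if not paper_keywords:
        return 0.0
    return len(paper_keywords & reviewer_keywords) / len(paper_keywords)

keyword_overlap_score({'Machine Learning', 'Graph Visualization'},
                      {'Machine Learning', 'Volume Rendering'})   # 0.5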
Full Analysis
(NB: Some of the plots shown above are repeated here for the sake of completeness.)
Submissions
How many papers were submitted to each area, and what is the breakdown of decisions?
Code
tmp = (submissions
    .value_counts(['Area', 'Decision'])
    .reset_index()
    .rename(columns={0: 'count'}))

fig = px.bar(tmp,
    x='count', y='Area',
    barmode='stack', orientation='h',
    color='Decision', text='count',
    custom_data=['Decision'],
).update_layout(
    title='Submissions by area',
    xaxis_title='Number of Submissions',
    **aspect(0.35)
).update_traces(
    hovertemplate='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)
Keywords
How often was a particular keyword specified?
Code
tc = [dict(n=c, p='All', f=c) for c in k_cnt['Category'].unique()]
ts = [dict(n=s, p=c, f=c)
      for _, c, s in k_cnt[['Category', 'Subcategory']].drop_duplicates().itertuples()
      if c != s]
tl = [dict(n=r['Short Name'],
           p=r.Category if r.Category == r.Subcategory else r.Subcategory,
           c=r.c, f=r.Category)
      for _, r in k_cnt.iterrows()]
tree = pd.DataFrame(tc + ts + tl).fillna(0)

px.treemap(tree,
    names=tree.n, parents=tree.p,
    values=tree.c, color=tree.f,
    # color_discrete_map=keyword_category_colors,
).update_layout(
    margin={'t': 0, 'b': 0, 'l': 0, 'r': 0},
    uniformtext=dict(minsize=10),
    **aspect(0.4)
).update_traces(
    hovertemplate="'%{label}' specified in %{value} submissions<extra></extra>",
    marker_depthfade='reversed',
).show(config=config)
Count of keywords
Code
# do a manual histogram to include non-specified keywords
px.bar(k_total,
    x='Short Name', y=0,
    color='Category',
    # color_discrete_map=keyword_category_colors,
).update_traces(
    hovertemplate="'%{x}' specified in %{y} submissions<extra></extra>",
).update_layout(
    xaxis_tickfont_size=8,
    xaxis_dtick=1,
    yaxis_dtick=20,
    xaxis_title='Keyword',
    yaxis_title='Number of Submissions',
    legend_title='Keyword Category',
    hovermode='closest',
    title='Frequency of keywords across submissions',
    **aspect(0.4)
).show(config=config)
How are keywords distributed across areas?
Code
# do a manual histogram to include non-specified keywords
k_cnt = keywords.merge(
    pd.DataFrame(areas.values(), columns=['Area']), how='cross'
).merge(
    k_all
        .value_counts(['Short Name', 'Area'])
        .reset_index(),
    how='outer'
).fillna(1e-10)  # needed for sorting, Plotly bug?

px.bar(k_cnt,
    x='Short Name', y=0,
    color='Area', custom_data=['Area']
).update_traces(
    hovertemplate='Keyword "%{x}" specified by %{y} submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode='stack',
    yaxis_title='Number of Submissions',
    xaxis_dtick=1,
    xaxis_tickfont_size=8,
    xaxis_fixedrange=True,
    yaxis_fixedrange=True,
    xaxis_categoryorder='total descending',
    title='Frequency of keywords across submissions, by area',
    legend_title='Area',
    **aspect(0.4)
).show(config=config)
How many submissions specified a given number of keywords?
Code
submissions['Number of Keywords'] = submissions['Keywords'].apply(
    lambda kw: len(kw.split('; '))).sort_values()

tmp = submissions.value_counts(['Number of Keywords', 'Area']).reset_index()

px.bar(tmp,
    x='Number of Keywords', y=0,
    barmode='stack', color='Area',
    custom_data=['Area'],
    labels={'0': "Number of Submissions"},
).update_traces(
    hovertemplate='%{y} submissions specified %{x} keywords in area "%{customdata}"<extra></extra>',
).update_layout(
    xaxis_dtick=1,
    title='Keyword count per submission',
    **aspect(0.4)
).show(config=config)
Does keyword count correlate with decision?
Code
# TODO: group 10+ together
tmp = (submissions
    .assign(**{'Number of Keywords':
        submissions['Number of Keywords'].map(lambda x: str(x) if x < 10 else '>=10')})
    .value_counts(['Number of Keywords', 'Decision'])
    .groupby(level=0)
    .apply(lambda g: pd.DataFrame({0: g, 1: g/g.sum(), 2: g.sum()}))
    .reset_index())

px.bar(tmp,
    x='Number of Keywords', y=0,
    barmode='stack', color='Decision',
    custom_data=['Decision', 0, 2],
    labels={'0': "Number of Submissions"},
).update_traces(
    hovertemplate='%{customdata[1]} (%{y}) of %{customdata[2]} submissions with %{x} keywords had decision "%{customdata[0]}"<extra></extra>',
).update_layout(
    xaxis_dtick=1,
    xaxis_type='category',
    xaxis_categoryorder='category ascending',
    yaxis_title='Submissions',
    title='Decisions by keyword count',
    **aspect(0.3)
).show(config=config)
Do specific keywords correlate with decision?
Code
# do a manual histogram to include non-specified keywords
k_dec = (k_all
    .groupby(['Short Name', 'Decision'])
    .size()
    .groupby(level=0)
    .apply(lambda g: pd.DataFrame({0: g, 1: 100*g/g.sum(), 2: g.sum()}))
    .reset_index())

px.bar(k_dec,
    x='Short Name', y=0,
    color='Decision',
    custom_data=['Decision', 1, 2],
).update_layout(
    xaxis_title='Keyword',
    yaxis_title='',
    xaxis_dtick=1,
    xaxis_tickfont_size=8,
    title='Decision by presence of keyword',
    **aspect(0.4)
).update_traces(
    hovertemplate="%{y} of %{customdata[2]} submissions (%{customdata[1]:.2f}%) specifying keyword '%{x}' had decision '%{customdata[0]}'<extra></extra>",
).show(config=config)
How often are keywords “esoteric”, i.e., specified with few or no co-keywords?
Code
tmp = (k_all.set_index('Paper ID').merge(submissions)
    .value_counts(['Short Name', 'Category', 'Number of Keywords'])
    .reset_index()
    .assign(**{'Number of Co-Keywords': (lambda x: x['Number of Keywords'] - 1)}))

px.box(tmp,
    x='Short Name', y='Number of Co-Keywords',
    color='Category',
    # color_discrete_map=keyword_category_colors,
).update_layout(
    xaxis_dtick=1,
    xaxis_tickfont_size=8,
    **aspect(0.4)
).update_traces(
    width=.5,
    line_width=1,
).show(config=config)
How often are pairs of keywords specified together?
Code
k_pairs = (k_all
    .groupby('Paper ID')
    .apply(lambda g: pd.DataFrame(itertools.combinations(g['Short Name'].values, 2)))
    .join(submissions['Decision']))

tmp = k_pairs.groupby([0, 1]).size().nlargest(40)
tmp = (k_pairs
    .set_index([0, 1])
    .loc[tmp.index]
    .assign(p=lambda df: [' + '.join(v) for v in df.index.values])
    .value_counts(['p', 'Decision'], sort=False)
    .rename('c')
    .reset_index())

px.bar(tmp,
    x='p', y='c',
    color='Decision',
    custom_data=['Decision'],
).update_layout(
    xaxis_title='Keyword Pair',
    yaxis_title='Submissions',
    xaxis_dtick=1,
    xaxis_categoryorder='total descending',
    xaxis_tickfont_size=8,
    title='Top 40 keyword pairs',
    **aspect(0.4)
).update_traces(
    hovertemplate='%{y} submissions with keyword pair "%{x}" had decision "%{customdata[0]}"<extra></extra>',
).show(config=config)